This document explores performance of different ensemble methods for forecasting incidence near local peaks.

Performance Near Local Peaks for State-Level Incident Cases

We examine how well each ensemble method forecast incident cases near local peaks, in states that have recently experienced a local peak. For now this is a manually curated list; all locations with at least two weeks of observed data after a recent local peak are included:

Plots per location

For each of the above locations, we display four plots:

  1. A plot of the estimated model weights, with the timing of the state-specific peak in incident cases indicated by a vertical line. Note that for this model, the estimated ensemble weights are the same across all states. In practice there may be minor differences across states due to different patterns of missing forecasts from different models; these differences are not shown here.
  2. The mean WIS for the median ensemble and the trained ensemble, averaged across all prospective week-ahead forecasts whose outcome has been observed at the time of report generation. Forecasts are subset to the weeks common to all ensemble methods we considered, so scores are available for the weeks determined by the cutoff for the ensemble with a window size of 10 weeks. The horizontal axis is the forecast date (the Monday of submission), not the target end date; for later target end dates, only scores for the shorter forecast horizons are available. The vertical line marks the timing of the state-specific peak incidence.
  3. The observed incident cases for that state, with a vertical line at the local peak. The observed data are weekly incident case counts as of the Saturday ending each week.
  4. Finally, a separate faceted plot shows the forecasts from each ensemble approach during the two weeks before the local peak, the week of the local peak, and the week after the local peak.
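For reference, the WIS used in these scores can be sketched as below. This is a minimal implementation of the weighted interval score for a single forecast expressed as a median plus central prediction intervals; the function and variable names are illustrative and not taken from the report's actual codebase.

```python
# Minimal sketch of the weighted interval score (WIS).
# Names (interval_score, wis) are illustrative, not from the report's code.

def interval_score(lower, upper, alpha, y):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower          # width penalty
    if y < lower:                  # penalty for missing below the interval
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:                # penalty for missing above the interval
        score += (2.0 / alpha) * (y - upper)
    return score

def wis(median, intervals, y):
    """WIS for one forecast.

    intervals: list of (alpha, lower, upper) tuples, one per central
    (1 - alpha) prediction interval.
    """
    K = len(intervals)
    total = 0.5 * abs(y - median)  # absolute error of the median, weight 1/2
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(lower, upper, alpha, y)
    return total / (K + 0.5)

# Example: median 100, one 50% interval [90, 120], observed value 110.
print(wis(100, [(0.5, 90, 120)], 110))
```

Averaging this quantity over the scored forecast weeks for each method gives the mean WIS plotted in item 2.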

The questions we hope these plots can help answer are:

  1. How does each method perform in the weeks immediately before and after a local peak? There is some variation across locations, but our sense is that immediately before a peak the trained ensemble is generally more aggressive in predicting a continued rise in incidence. This leads to better scores during the rise, but worse scores at the time of the peak. However, the trained ensemble recovers quickly, and the two methods are generally quite similar immediately after the peak.
  2. Overall, in the locations where we have seen a peak, what is the relative ranking of the methods? For most locations, the trained approach is better than the median when averaging across all scored weeks. In terms of mean WIS, the improvements during the rise in incidence offset the penalty incurred for overprediction at the peak.
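The per-location comparison described above can be computed with a summary like the following sketch. It assumes a long-format scores table with columns `location`, `method`, and `wis`; the column names and the data values are illustrative, not the report's actual scores.

```python
import pandas as pd

# Illustrative scores table: one row per (location, method, forecast),
# holding the WIS for that forecast. Data are made up for this sketch.
scores = pd.DataFrame({
    "location": ["CA", "CA", "CA", "CA", "TX", "TX", "TX", "TX"],
    "method":   ["median", "trained"] * 4,
    "wis":      [120.0, 100.0, 150.0, 130.0, 80.0, 90.0, 70.0, 60.0],
})

# Mean WIS per location and method, averaged over all scored weeks.
mean_wis = (
    scores.groupby(["location", "method"])["wis"]
    .mean()
    .unstack("method")
)

# Relative performance: a positive value means the trained ensemble
# improved on the median ensemble for that location.
mean_wis["improvement"] = mean_wis["median"] - mean_wis["trained"]
print(mean_wis)
```

Sorting by the `improvement` column then gives the per-location ranking of the two methods.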